Use UTF

#Use UTF| 来源: 网络整理| 查看: 265

Use UTF-8 code pages in Windows apps Article 04/20/2022 2 minutes to read

Use UTF-8 character encoding for optimal compatibility between web apps and other *nix-based platforms (Unix, Linux, and variants), minimize localization bugs, and reduce testing overhead.

UTF-8 is the universal code page for internationalization and is able to encode the entire Unicode character set. It is used pervasively on the web, and is the default for *nix-based platforms.

Set a process code page to UTF-8

As of Windows Version 1903 (May 2019 Update), you can use the ActiveCodePage property in the appxmanifest for packaged apps, or the fusion manifest for unpackaged apps, to force a process to use UTF-8 as the process code page.

You can declare this property and target/run on earlier Windows builds, but you must handle legacy code page detection and conversion as usual. With a minimum target version of Windows Version 1903, the process code page will always be UTF-8 so legacy code page detection and conversion can be avoided.

Note

An encoded character takes between 1 and 4 bytes. UTF-8 encoding supports longer byte sequences, up to 6 bytes, but the biggest code point of Unicode 6.0 (U+10FFFF) only takes 4 bytes.

Examples

Appx manifest for a packaged app:

UTF-8

Fusion manifest for an unpackaged Win32 app:

UTF-8

Note

Add a manifest to an existing executable from the command line with mt.exe -manifest -outputresource:;#1

-A vs. -W APIs

Win32 APIs often support both -A and -W variants.

-A variants recognize the ANSI code page configured on the system and support char*, while -W variants operate in UTF-16 and support WCHAR.

Until recently, Windows has emphasized "Unicode" -W variants over -A APIs. However, recent releases have used the ANSI code page and -A APIs as a means to introduce UTF-8 support to apps. If the ANSI code page is configured for UTF-8, -A APIs typically operate in UTF-8. This model has the benefit of supporting existing code built with -A APIs without any code changes.

Code page conversion

As Windows operates natively in UTF-16 (WCHAR), you might need to convert UTF-8 data to UTF-16 (or vice versa) to interoperate with Windows APIs.

MultiByteToWideChar and WideCharToMultiByte let you convert between UTF-8 and UTF-16 (WCHAR) (and other code pages). This is particularly useful when a legacy Win32 API might only understand WCHAR. These functions allow you to convert UTF-8 input to WCHAR to pass into a -W API and then convert any results back if necessary. When using these functions with CodePage set to CP_UTF8, use dwFlags of either 0 or MB_ERR_INVALID_CHARS, otherwise an ERROR_INVALID_FLAGS occurs.

Note

CP_ACP equates to CP_UTF8 only if running on Windows Version 1903 (May 2019 Update) or above and the ActiveCodePage property described above is set to UTF-8. Otherwise, it honors the legacy system code page. We recommend using CP_UTF8 explicitly.

Related topics Code pages Code Page Identifiers

【本文地址】

公司简介

联系我们